Abstract. Pronunciation learners can benefit from peer feedback in a computer-mediated communication (CMC) environment that allows them to notice segmentals
and suprasegmentals. This paper explores the intelligibility judgments of same-L1 
peers using P-Check (Version2, https://ver2.jp), a Learning Management System 
(LMS) plug-in that aggregates peer feedback on local intelligibility (Munro & 
Derwing, 2015). P-Check randomly delivers written prompts for learners to record. 
Recordings are randomly delivered to peers, who select from a drop-down menu
the utterance they perceived. Aggregated judgments from peers and from the
instructor are displayed to learners as feedback on intelligibility. This study used 
eight segmental contrasts: /b-v/, /s-θ/, /l-ɹ/, /l-ɹ/-clusters, /æ-ʌ/, /a-a/, /a-ow/, and /i-ɪ/.
Participants (N=38) made 3,451 intelligibility judgments on 1,203 recordings. The 
effects of rater listening discrimination proficiency and of utterance intelligibility 
were examined in six contrasts using Generalized Estimating Equations (GEE). 
Results showed that intelligibility was generally a significant predictor of judgment
accuracy, but rater listening discrimination proficiency was not.
1. Background


In pronunciation learning, the effectiveness of same-L1 peer feedback in English 
as a foreign language environments has yet to be fully explored. There is some 
evidence that same-L1 learners can benefit from peer pronunciation feedback, 
especially in asynchronous CMC environments that allow repeated listening to 
recorded speech, giving learners time to notice pronunciation features (Correa 
& Grim, 2014; Gilakjani, Ahmadi, & Ahmadi, 2011). However, feedback 
from same-L1 learners may be problematic in task-based communication because
learners may converge on a shared non-standard pronunciation. Walker (2005)
recommends preventing convergence through highly controlled activities. Thus,
the present study directs participants’ feedback to selected features by having 
them make a forced-choice judgment of local intelligibility (Munro & Derwing, 
2015). 


Pronunciation instruction would benefit from a better understanding of what 
factors underlie the accuracy of these same-L1 peer judgments of intelligibility. 
This exploratory study focuses on two aspects: the stimulus and the learner. The 
research questions are: (1) to what extent does the accuracy of local intelligibility 
judgments by same-L1 peers vary depending on the targeted phoneme and 
utterance accuracy, and (2) to what extent does it depend on rater listening 
discrimination ability? 


2. Method 


2.1. Participants 


The 38 participants (17 male, 21 female) in this convenience sample were Japanese
university students enrolled in an elective first-year practical English phonetics
course. All provided informed consent following the standards of Teachers of
English to Speakers of Other Languages (TESOL).


2.2. Classroom environment 
The language targets were eight segmental contrasts that are difficult for this learner
population: /b-v/, /s-θ/, /l-ɹ/, /l-ɹ/-clusters, /æ-ʌ/, /a-a/, /a-ow/, and /i-ɪ/. Materials
consisted of 47 pairs of two-line contrastive conversations with L1 glosses. Within
each pair, the first lines of the two conversations differed by one phoneme, for example:
Conversation 1. A: He is a good leader. B: Everyone trusts him. 
Conversation 2. A: He is a good reader. B: He loves books. 
After receiving focused instruction on the targeted phoneme, participants completed
individual online listening discrimination and pronunciation practice. Due to
time constraints, this practice was not completed for the /s-θ/ or /a-ow/ contrasts.
The learning sequence included a listening discrimination pre-test, shadowing,
listening discrimination practice, visual input, pronunciation practice, and choral 
repetition of the contrastive conversations to familiarize participants with their 
meaning and pronunciation. Finally, participants engaged in online peer judgments 
of intelligibility for approximately 15 minutes per contrast. 


2.3. Software 


The peer judgments were conducted using P-Check, a plug-in for Glexa, a 
proprietary LMS that has been used by more than 100,000 students in over 
1,000 university courses throughout Japan. P-Check randomly presented the first 
line of one of the two conversations onscreen for the learner to record. Recordings 
were randomly delivered to peers who selected the appropriate second line of the 
conversation from a drop-down menu. Once a recording had received four judgments,
it was taken out of circulation. A native-speaker (NS) rater also used P-Check
to judge the intelligibility of all recordings. Peer and NS rater feedback for each 
recording was displayed to individual participants. 
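The delivery procedure described above (random prompts, random assignment of recordings to peers, and retirement after four judgments) can be sketched as follows. This is a hypothetical illustration, not P-Check's implementation: the class and method names are invented, and the rules that raters never receive their own recordings or the same recording twice are assumptions not stated in the text.

```python
import random
from collections import defaultdict

# Hypothetical sketch of P-Check-style delivery; not the actual P-Check code.
MAX_JUDGMENTS = 4  # recordings leave circulation after four peer judgments


class PeerJudgmentPool:
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.speaker = {}                   # recording id -> id of the learner who recorded it
        self.judgments = defaultdict(list)  # recording id -> [(rater id, chosen second line)]

    def prompt_for(self, pair):
        """Randomly pick which conversation's first line the learner will record."""
        return self.rng.choice(pair)

    def add_recording(self, rec_id, speaker_id):
        self.speaker[rec_id] = speaker_id

    def in_circulation(self):
        """Recordings that have received fewer than MAX_JUDGMENTS judgments."""
        return [r for r in self.speaker if len(self.judgments[r]) < MAX_JUDGMENTS]

    def next_for(self, rater_id):
        """Randomly choose a circulating recording for this rater, or None if none is eligible.

        Assumption: raters never judge their own recordings or the same recording twice.
        """
        eligible = [
            r for r in self.in_circulation()
            if self.speaker[r] != rater_id
            and all(rater != rater_id for rater, _ in self.judgments[r])
        ]
        return self.rng.choice(eligible) if eligible else None

    def submit(self, rec_id, rater_id, chosen_line):
        """Store a drop-down choice; the recording retires once it has MAX_JUDGMENTS."""
        if len(self.judgments[rec_id]) >= MAX_JUDGMENTS:
            raise ValueError(f"{rec_id} is no longer in circulation")
        self.judgments[rec_id].append((rater_id, chosen_line))


# Usage: one recording judged by four peers, after which it retires.
pool = PeerJudgmentPool(seed=0)
pair = ("He is a good leader.", "He is a good reader.")
prompt = pool.prompt_for(pair)
pool.add_recording("rec-1", speaker_id="s1")
for rater in ["s2", "s3", "s4", "s5"]:
    rec = pool.next_for(rater)
    pool.submit(rec, rater, "B: He loves books.")
print(pool.in_circulation())  # []
```

Under these assumptions, a recording stays in the pool until its fourth judgment is submitted; the text does not say whether P-Check weights or orders eligible recordings, so uniform random choice is used here.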


2.4. Data collection and analysis 


Data were gathered from the P-Check database. They consisted of participants’
intelligibility judgments (n=3,451), which were compared with the NS rater’s
judgments of the 1,215 recordings produced by participants.
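Comparing each peer judgment with the NS rater's judgment of the same recording yields the four cells used in Table 1. The sketch below assumes that a judgment counts as "intelligible" when the selected second line matches the conversation the speaker was prompted to record; this is consistent with Section 2.3 but not spelled out in the text. The record layout (`Judgment` and its fields) is hypothetical, not the P-Check database schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Judgment:
    """One peer judgment (hypothetical record layout, not the P-Check database schema)."""
    prompt: str       # conversation the speaker was asked to record
    peer_choice: str  # conversation the peer selected from the drop-down menu
    ns_choice: str    # conversation the NS rater selected for the same recording


def classify(j: Judgment) -> tuple:
    """Return (peer judgment, accuracy), i.e. one cell of Table 1."""
    peer_intelligible = j.peer_choice == j.prompt
    ns_intelligible = j.ns_choice == j.prompt
    row = "Intelligible" if peer_intelligible else "Unintelligible"
    # A judgment is accurate when the peer and the NS rater agree on intelligibility.
    col = "Accurate" if peer_intelligible == ns_intelligible else "Inaccurate"
    return row, col


def cross_tab(judgments):
    """Count judgments per Table 1 cell."""
    return Counter(classify(j) for j in judgments)


# Usage with three toy judgments (not study data).
toy = [
    Judgment("leader", "leader", "leader"),  # both understood the intended word
    Judgment("leader", "reader", "leader"),  # peer missed an utterance the NS found intelligible
    Judgment("reader", "leader", "leader"),  # both heard "leader" instead of "reader"
]
print(cross_tab(toy))
```

Note that the third toy judgment is "accurate" even though the peer did not recover the intended word: accuracy here means agreement with the NS rater, not successful decoding.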


In further analysis, the relative effects of utterance intelligibility and rater listening 
discrimination proficiency were modeled for the six contrasts listed in Table 3. 
GEE, which produces a population-averaged model, was used because it accommodates
repeated categorical outcomes from the same participant while allowing the number
of outcomes to differ across participants.


The model included one centered covariate, listening discrimination proficiency,
and one factor, intelligibility of the utterance. The outcome variable, specified in
events-in-trials form, was the accuracy (accurate/inaccurate) of each intelligibility
judgment. The models used a binomial distribution with a logit link function, an
exchangeable working correlation structure, and robust standard errors (Heck,
Thomas, & Tabata, 2012). Models were run separately for each contrast, and those
with the lowest QICC², a criterion used in model selection, were chosen. An alpha
level of .05 was used for all statistical tests.
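The GEE specification described above can be summarized in a few equations. This is a sketch under stated assumptions rather than the exact software setup: the symbols are introduced here for illustration, each judgment is treated as one Bernoulli trial, and the intelligibility factor is assumed to be dummy-coded with intelligible utterances as the reference category (so a negative coefficient means lower odds of an accurate judgment for unintelligible utterances).

```latex
% Y_{ij} = 1 if rater i's j-th judgment agrees with the NS rater, 0 otherwise
% U_{ij} = 1 if the utterance was unintelligible to the NS rater (assumed coding)
% S_i = rater i's listening discrimination pre-test score; \bar{S} = sample mean
\[
  \operatorname{logit} \Pr(Y_{ij} = 1)
    = \beta_0 + \beta_1 U_{ij} + \beta_2 \,(S_i - \bar{S}),
  \qquad j = 1, \dots, n_i
\]
% Exchangeable working correlation among rater i's n_i judgments
% (I = identity matrix, J = all-ones matrix)
\[
  \mathbf{R}_i(\rho) = (1 - \rho)\,\mathbf{I}_{n_i} + \rho\,\mathbf{J}_{n_i}
\]
% Wald test of one coefficient with the robust (sandwich) standard error;
% approximately chi-square with 1 df under H_0: \beta_k = 0
\[
  W_k = \left( \frac{\hat{\beta}_k}{\widehat{\mathrm{SE}}_{\mathrm{robust}}(\hat{\beta}_k)} \right)^{2}
  \;\dot\sim\; \chi^2(1)
\]
```

Because the size of $\mathbf{R}_i$ is set by $n_i$, raters who made different numbers of judgments fit the same structure, and centering $S_i$ makes $\beta_0$ the log-odds of an accurate judgment of an intelligible utterance by a rater with an average pre-test score.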


2. Corrected quasi-likelihood under independence model criterion
3. Initial results and discussion


Research Question 1 asks to what extent the accuracy of local intelligibility
judgments varies depending on the targeted phoneme and utterance accuracy. Table 1
suggests that participants were more likely to make accurate judgments when the 
utterance was intelligible (to the NS rater) than when it was not. 


Table 1. Accuracy of peer judgments by perceived intelligibility 

Peer judgment      Accurate (%)     Inaccurate (%)    Total (%)
Intelligible       1,819 (52.7)     437 (12.7)        2,256 (65.4)
Unintelligible     460 (13.3)       735 (21.3)        1,195 (34.6)
Total              2,279 (66.0)     1,172 (34.0)      3,451 (100)

Note. Accuracy refers to agreement with the NS rater’s judgment of the same recording.


In 21.3% of all judgments, peers failed to recognize a phoneme that the NS rater
found intelligible; these errors may be related to low listening discrimination ability.
Only 12.7% of judgments involved peers judging a phoneme that the NS rater found
unintelligible to be intelligible; these judgments may reflect the raters’ shared
knowledge of the L1 phonology.
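Because the rows of Table 1 are peer judgments, the comparison made in Section 3 (accuracy when the utterance was intelligible to the NS rater versus when it was not) requires regrouping the cells by the NS rater's judgment. The worked check below uses only the Table 1 counts; the variable names are illustrative.

```python
# Cell counts copied from Table 1 (rows: peer judgment; columns: agreement with the NS rater).
TABLE1 = {
    ("Intelligible", "Accurate"): 1819,
    ("Intelligible", "Inaccurate"): 437,
    ("Unintelligible", "Accurate"): 460,
    ("Unintelligible", "Inaccurate"): 735,
}
total = sum(TABLE1.values())

# Regroup by the NS rater's judgment: a peer "Intelligible" judgment is accurate only when
# the NS rater also found the utterance intelligible, and a peer "Unintelligible" judgment
# is inaccurate only when the NS rater found it intelligible.
ns_intelligible = TABLE1[("Intelligible", "Accurate")] + TABLE1[("Unintelligible", "Inaccurate")]
ns_unintelligible = TABLE1[("Intelligible", "Inaccurate")] + TABLE1[("Unintelligible", "Accurate")]

# Proportion of accurate peer judgments, conditional on the NS rater's judgment.
acc_if_intelligible = TABLE1[("Intelligible", "Accurate")] / ns_intelligible
acc_if_unintelligible = TABLE1[("Unintelligible", "Accurate")] / ns_unintelligible

print(total, ns_intelligible, ns_unintelligible)  # 3451 2554 897
print(f"{acc_if_intelligible:.1%} vs {acc_if_unintelligible:.1%}")  # 71.2% vs 51.3%
```

On these counts, peers agreed with the NS rater on about 71% of judgments of NS-intelligible utterances but on only about 51% of judgments of NS-unintelligible ones, in line with the pattern described for Table 1.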


Table 2 indicates substantial variation in mean intelligibility among the contrasts.
The least intelligible contrasts were /s-θ/ and /a-ow/, which had been taught but
not fully practiced, and /l-ɹ/ clusters. The /æ-ʌ/ and /a-a/ contrasts were among
the most intelligible and the most accurately judged.


Table 2. Accuracy of peer intelligibility judgments by contrast 
Contrast          Intelligibility (NS)   Peer judgments   Peer accuracy
                  M      SD              n                M      SD
/b-v/             .80    .401            334              .65    .479
/s-θ/             .72    .452            426              .66    .473
/l-ɹ/             .73    .445            342              .64    .480
/l-ɹ/ clusters    .68    .466            395              .59    .493
/i-ɪ/             .73    .444            480              .71    .453
/a-ow/            .65    .476            627              .63    .484
/a-a/             .80    .403            429              .70    .457
/æ-ʌ/             .86    .346            418              .69    .463
Total                                    3,451
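Both outcomes in Table 2 are binary (intelligible or not; accurate or not), so each SD is essentially determined by its mean, SD ≈ √(M(1 − M)), and the n-weighted means across contrasts should reproduce the overall figures in Table 1. The consistency check below uses only values from Table 2; the helper names are illustrative.

```python
import math

# (contrast, NS intelligibility M, peer judgments n, peer accuracy M), copied from Table 2
TABLE2 = [
    ("/b-v/", 0.80, 334, 0.65),
    ("/s-θ/", 0.72, 426, 0.66),
    ("/l-ɹ/", 0.73, 342, 0.64),
    ("/l-ɹ/ clusters", 0.68, 395, 0.59),
    ("/i-ɪ/", 0.73, 480, 0.71),
    ("/a-ow/", 0.65, 627, 0.63),
    ("/a-a/", 0.80, 429, 0.70),
    ("/æ-ʌ/", 0.86, 418, 0.69),
]


def binary_sd(p):
    """SD of a 0/1 variable with mean p (population form; the n - 1 form differs negligibly here)."""
    return math.sqrt(p * (1 - p))


def weighted_mean(column):
    """n-weighted mean of one Table 2 column across all contrasts."""
    total_n = sum(row[2] for row in TABLE2)
    return sum(row[column] * row[2] for row in TABLE2) / total_n


n_total = sum(row[2] for row in TABLE2)
ns_intelligibility = weighted_mean(1)
peer_accuracy = weighted_mean(3)

for name, _, _, acc in TABLE2:
    print(f"{name}: peer-accuracy SD implied by M = {binary_sd(acc):.3f}")
print(n_total, round(ns_intelligibility, 3), round(peer_accuracy, 3))
```

The implied SDs match the reported ones to within rounding of the two-decimal means, and the weighted peer accuracy (about .66) matches the 66.0% overall accuracy in Table 1.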


Research Question 2 asks to what extent the accuracy of judgments depends on rater
listening discrimination ability. This ability was measured for six contrasts with the
listening discrimination pre-test given at the beginning of the teaching sequence.
Participants heard 20 pairs of words, half minimal pairs (e.g., lake, rake) and half
tokens of the same word (e.g., lake1, lake2), and marked each pair same or different.
Table 3 indicates that participants discriminated /æ-ʌ/ the best and /l-ɹ/-clusters
the least well.
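Scoring this pre-test and turning it into the centered covariate used in the GEE models can be sketched as below. The item format, the function names, and the assumption that the score is simply the number of correct same/different responses (0 to 20) centered on the sample mean are illustrative, not taken from the study materials.

```python
from statistics import mean


def score_pretest(items, responses):
    """Number of correct same/different responses (0-20 for the 20-pair pre-test).

    Hypothetical item format: each item is a (word_a, word_b) pair; identical words
    stand for two tokens of the same word (e.g. lake1, lake2), different words for a
    minimal pair (e.g. lake, rake).
    """
    if len(items) != len(responses):
        raise ValueError("expected one response per item")
    key = ["same" if a == b else "different" for a, b in items]
    return sum(resp == k for resp, k in zip(responses, key))


def center(scores):
    """Subtract the sample mean so that 0 means an average discriminator (assumed centering)."""
    m = mean(scores.values())
    return {pid: s - m for pid, s in scores.items()}


# Usage with a toy four-item excerpt and toy scores (not study data).
items = [("lake", "rake"), ("lake", "lake"), ("rock", "lock"), ("rock", "rock")]
print(score_pretest(items, ["different", "same", "same", "same"]))  # 3
print(center({"p1": 14, "p2": 18, "p3": 16}))
```

Since the pre-test was given per contrast, this centering would presumably be done separately within each contrast's data.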


Table 3. Listening discrimination scores by contrast 


Contrast          n     Min.   Max.   M       SD
/b-v/             30    11     18     14.33   1.77
/l-ɹ/             37    10     19     14.05   22d
/l-ɹ/ clusters    32    6      17     11.44   2.41
/i-ɪ/             31    15     20     18.29   1.53
/a-a/             32    14     20     17.25   1.83
/æ-ʌ/             31    15     20     18.97   1.25


Results of the GEE analysis were as follows. For the /b-v/ and /l-ɹ/-clusters
contrasts, neither intelligibility nor listening discrimination was a significant
predictor of judgment accuracy. For the remaining contrasts, parameter estimates
showed that unintelligible utterances received accurate intelligibility judgments
at a significantly lower rate than intelligible utterances (/l-ɹ/: Wald χ²(1)=6.054,
p=.014, B=-.584; /i-ɪ/: Wald χ²(1)=12.388, p<.001, B=-.949; /æ-ʌ/: Wald
χ²(1)=11.158, p=.001, B=-.928; /a-a/: Wald χ²(1)=69.707, p<.001, B=-2.979). For
/a-a/, listening discrimination was also a significant predictor of judgment accuracy
(Wald χ²(1)=5.888, p=.015, B=.141).
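The reported statistics can be cross-checked and translated into odds ratios with the standard library alone. For a Wald χ² statistic with 1 df, the p-value is erfc(√(χ²/2)), and exponentiating a logit coefficient B gives an odds ratio. Reading the intelligibility odds ratios as "unintelligible versus intelligible" assumes intelligible utterances were the reference category, which the text does not state explicitly.

```python
import math


def wald_p(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 df: erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


def odds_ratio(b):
    """Exponentiated logit coefficient."""
    return math.exp(b)


# (label, Wald chi-square, B) as reported in the GEE results.
REPORTED = [
    ("/l-ɹ/ intelligibility", 6.054, -0.584),
    ("/i-ɪ/ intelligibility", 12.388, -0.949),
    ("/æ-ʌ/ intelligibility", 11.158, -0.928),
    ("/a-a/ intelligibility", 69.707, -2.979),
    ("/a-a/ listening discrimination", 5.888, 0.141),
]

for label, chi2, b in REPORTED:
    print(f"{label}: p = {wald_p(chi2):.3f}, odds ratio = {odds_ratio(b):.2f}")
```

The recomputed p-values round to the reported ones (.014, <.001, .001, <.001, .015). Under the reference-category assumption, the odds ratios of about 0.56, 0.39, 0.40, and 0.05 mean, for example, that for /i-ɪ/ the odds of an accurate judgment were roughly 0.39 times as high for an unintelligible utterance as for an intelligible one; for /a-a/, each additional pre-test point was associated with about 15% higher odds of an accurate judgment.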


4. Conclusion 


Intelligibility was a significant predictor of judgment accuracy for four of the six
modeled contrasts, the exceptions being /b-v/ and /l-ɹ/-clusters. Closer examination
reveals further variation even within some contrasts. For example, /b/ had 75%
judgment accuracy, while /v/, a phoneme not in the participants’ L1 inventory, had
only 56% accuracy. A different pattern was seen for /l/ and /ɹ/, both of which were
judged with 61% mean accuracy. However, the /l-ɹ/ contrast showed strong variation
at the item level, with accuracy ranging from 25% (long) to 91% (lamp).
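The within-contrast figures above (per-phoneme accuracy such as 75% for /b/, per-item accuracy such as 25% for long) come from grouping judgments by target phoneme or by item before averaging. A minimal sketch, assuming each judgment record carries a target-phoneme label, an item word, and an accuracy flag; this record layout and the toy records are hypothetical, not the study's data.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Proportion of accurate judgments per group, where key(record) names the group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        hits[group] += rec["accurate"]  # True counts as 1, False as 0
    return {g: hits[g] / totals[g] for g in totals}


# Toy records (not study data): target phoneme, item word, and whether the
# peer judgment agreed with the NS rater.
records = [
    {"target": "/l/", "item": "long", "accurate": False},
    {"target": "/l/", "item": "long", "accurate": True},
    {"target": "/l/", "item": "lamp", "accurate": True},
    {"target": "/l/", "item": "lamp", "accurate": True},
    {"target": "/ɹ/", "item": "wrong", "accurate": True},
    {"target": "/ɹ/", "item": "ramp", "accurate": False},
]

by_target = accuracy_by(records, key=lambda r: r["target"])
by_item = accuracy_by(records, key=lambda r: r["item"])
hardest = min(by_item, key=by_item.get)  # lowest item-level accuracy
easiest = max(by_item, key=by_item.get)  # highest (first one wins on ties)
print(by_target, hardest, easiest)
```

Comparing the spread of `by_item` with the single `by_target` mean is one way to see how a contrast-level average can hide large item-level differences.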


Although a detailed analysis is beyond the scope of this paper, these results suggest
that intelligibility judgments were highly sensitive to variation in the targets, at both
the phoneme and the item level. Unexpectedly, listening discrimination ability predicted
judgment accuracy for only one contrast (/a-a/), possibly indicating that a more robust
measure of this covariate is needed.